Speaker-dependent multipitch tracking using deep neural networks

Authors

Yuzhou Liu

DeLiang Wang

Abstract

Multipitch tracking is important for speech and signal processing. However, it is challenging to design an algorithm that achieves accurate pitch estimation and correct speaker assignment at the same time. In this paper, deep neural networks (DNNs) are used to model the probabilistic pitch states of two simultaneous speakers. To capture speaker-dependent information, two types of DNN with different training strategies are proposed. The first is trained for each speaker enrolled in the system (speaker-dependent DNN), and the second is trained for each speaker pair (speaker-pair-dependent DNN). Several extensions, including gender-pair-dependent DNNs, speaker adaptation of gender-pair-dependent DNNs and training with multiple energy ratios, are introduced later to relax constraints. A factorial hidden Markov model (FHMM) then integrates pitch probabilities and generates the most likely pitch tracks with a junction tree algorithm. Experiments show that the proposed methods substantially outperform other speaker-independent and speaker-dependent multipitch trackers on two-speaker mixtures. With multi-ratio training, the proposed methods achieve consistent performance at various energy ratios of the two speakers in a mixture.
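The decoding stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names a junction tree algorithm, whereas the code below substitutes a joint Viterbi search over the product of the two speakers' pitch states, which should recover the same most likely pair of tracks for a two-chain FHMM whose chains evolve independently. The function name `joint_viterbi`, the array layout and the idea of passing DNN-derived scores as `log_emit` are all assumptions made for illustration.

```python
import numpy as np


def joint_viterbi(log_emit, log_trans_a, log_trans_b, log_prior_a, log_prior_b):
    """Most likely pair of pitch-state paths for two speakers (illustrative).

    log_emit:    (T, S, S) log score of speaker A in state i and speaker B
                 in state j at frame t (assumed to come from the DNN outputs).
    log_trans_*: (S, S) log transition matrices, indexed [previous, next].
    log_prior_*: (S,) log initial-state probabilities.
    Returns an integer array of shape (T, 2): one state per speaker per frame.
    """
    T, S, _ = log_emit.shape
    delta = log_prior_a[:, None] + log_prior_b[None, :] + log_emit[0]
    back = np.zeros((T, S, S, 2), dtype=int)
    rows = np.arange(S)[:, None]

    for t in range(1, T):
        # Maximise over speaker A's previous state.
        # cand_a is indexed [i_prev, i_new, j_prev].
        cand_a = delta[:, None, :] + log_trans_a[:, :, None]
        best_a = cand_a.max(axis=0)      # [i_new, j_prev]
        arg_a = cand_a.argmax(axis=0)    # best i_prev for each [i_new, j_prev]

        # Maximise over speaker B's previous state.
        # cand_b is indexed [i_new, j_prev, j_new].
        cand_b = best_a[:, :, None] + log_trans_b[None, :, :]
        arg_b = cand_b.argmax(axis=1)    # best j_prev for each [i_new, j_new]

        back[t, :, :, 1] = arg_b
        back[t, :, :, 0] = arg_a[rows, arg_b]
        delta = cand_b.max(axis=1) + log_emit[t]

    # Trace the best joint path backwards from the final frame.
    path = np.zeros((T, 2), dtype=int)
    path[-1] = np.unravel_index(delta.argmax(), delta.shape)
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t, 0], path[t, 1]]
    return path
```

Maximising over one speaker's previous state at a time keeps each frame at roughly S^3 operations instead of the S^4 needed by a naive search over all pairs of joint states, which is the kind of saving that exploiting the factorial structure is meant to provide.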


Similar resources

Convolutional neural network with adaptive windows for speech recognition

Although speech recognition systems are widely used and their accuracy continues to improve, there is a considerable gap between their accuracy and human recognition ability. This is partially due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...


The Use of Wavelets in Speaker Feature Tracking Identification System Using Neural Network

Continuous and discrete wavelet transforms (WT) are used to create a text-dependent, noise-robust speaker recognition system. In this paper we investigate the accuracy of speaker identification in nonstationary signals. Three methods are used to extract the essential speaker features, based on the continuous wavelet transform, the discrete wavelet transform and power spectral density (PSD). To have better ide...


Voice Conversion Based on Speaker-Dependent Restricted Boltzmann Machines

This paper presents a voice conversion technique using speaker-dependent Restricted Boltzmann Machines (RBMs) to build high-order eigenspaces of source and target speakers, where it is easier to convert the source speech to the target speech than in the traditional cepstrum space. We build a deep conversion architecture that concatenates the two speaker-dependent RBMs with neural networks, expecting ...


Embedding-Based Speaker Adaptive Training of Deep Neural Networks

An embedding-based speaker adaptive training (SAT) approach is proposed and investigated in this paper for deep neural network acoustic modeling. In this approach, speaker embedding vectors, which are a constant given a particular speaker, are mapped through a control network to layer-dependent elementwise affine transformations to canonicalize the internal feature representations at the output...


EM-Based Gain Adaptation for Probabilistic Multipitch Tracking

We introduce an EM algorithm for automatic speaker gain adaptation, and use this approach for probabilistic multipitch tracking. We derive a lower bound on the log-likelihood of the gain parameters and use a fast pruning method to make lower-bound optimization efficient. We evaluate the performance of gain-adapted multipitch tracking on the GRID database, where 3000 speech mixtures were generat...



Journal:

The Journal of the Acoustical Society of America

Volume 141, Issue 2

Publication date: 2015
